Open-Ended Visual Question-Answering

Authors

Issey Masuda

Santiago Pascual de la Puente

Xavier Giró

Abstract

This thesis studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle text-based Question-Answering. We then modify that model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These features are merged with either the word embeddings or a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved an accuracy of 53.62% on the test dataset. The software follows good programming practices and Python code style, providing a consistent baseline in Keras for different configurations. The source code and models are publicly available at https://github.com/imatge-upc/vqa-2016-cvprw.
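The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' Keras implementation: the abstract does not say how the two representations are merged, so concatenation is an assumption here, as are the toy vector sizes, the random untrained weights, the four-entry answer vocabulary, and the helper names `softmax`, `dense`, and `predict_answer`. The lists standing in for CNN image features and for the LSTM sentence embedding are placeholders, not real model outputs.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vector, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def predict_answer(image_features, question_embedding, weights, bias, answers):
    """Merge both modalities (assumed: concatenation), then classify."""
    merged = list(image_features) + list(question_embedding)
    probs = softmax(dense(merged, weights, bias))
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


# Toy sizes and untrained random weights, for illustration only.
rng = random.Random(0)
answers = ["yes", "no", "2", "red"]                     # hypothetical answer vocabulary
image_features = [rng.random() for _ in range(8)]       # stand-in for CNN features
question_embedding = [rng.random() for _ in range(6)]   # stand-in for a sentence embedding
weights = [[rng.uniform(-1, 1) for _ in range(14)] for _ in answers]
bias = [0.0] * len(answers)

answer, probs = predict_answer(
    image_features, question_embedding, weights, bias, answers)
print(answer, [round(p, 3) for p in probs])
```

Treating answer prediction as classification over a fixed answer vocabulary is also an assumption of this sketch; the published code in the repository above is the reference for the actual model configurations.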


Similar resources

Proposing Plausible Answers for Open-ended Visual Question Answering

Answering open-ended questions is an essential capability for any intelligent agent. One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA) which attempts to evaluate a system’s visual understanding through its answers to natural language questions about images. There exist many approaches to VQA, the majority of which do not exhibit deepe...


Video Question Answering via Hierarchical Spatio-Temporal Attention Networks

Open-ended video question answering is a challenging problem in visual information retrieval, which automatically generates the natural language answer from the referenced video content according to the question. However, the existing visual question answering works only focus on the static image, which may be ineffectively applied to video question answering due to the lack of modeling the tem...


Highway Networks for Visual Question Answering

We propose a version of highway network designed for the task of Visual Question Answering. We take inspiration from recent success of Residual Layer Network and Highway Network in learning deep representation of images and fine grained localization of objects. We propose variation in gating mechanism to allow incorporation of word embedding in the information highway. The gate parameters are i...


Bilinear Pooling and Co-Attention Inspired Models for Visual Question Answering

In recent years, open-ended visual question answering has been an area of active research. In this work, we present our exploration of two state-of-the-art architectures, Multi-modal Compact Bilinear Pooling (MCB) and the Dynamic Memory Network (DMN), together with an analysis of the results and performance of the models. We found both models to perform comparably on the VQA v2.0 dataset based on predic...


Deep Learning for Visual Question Answering

This project deals with the problem of Visual Question Answering (VQA). We develop neural network-based models to answer open-ended questions that are grounded in images. We used the newly released VQA dataset (with about 750K questions) to carry out our experiments. Our model makes use of two popular neural network architectures: Convolutional Neural Nets (CNN) and Long Short Term Memory Networ...



Journal: CoRR

Volume: abs/1610.02692

Publication date: 2016
